Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore missing config.v2.json files on migration #303

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

lmbarros
Copy link
Contributor

@lmbarros lmbarros commented Jul 14, 2022

TODO: One thing we should do in this PR: if every attempt of making the migration fails, simply remove all images and containers and let the device redownload everything. (This is actually how we designed the migration, but not the observed behavior.)

We have seen some cases in which the file
/var/lib/docker/containers/<UUID>/config.v2.json was not found for
certain UUIDs. This causes the whole migration to fail and, worse,
causes our migration cleanup code to fail.

A missing config.v2.json indicates that directory does not contain a
valid container, so we should not even try to migrate them. That's what
this commit does: it makes the migration ignore directories with a
missing config.v2.json.

I don't think we should have directories like this in the first place,
this is probably the side effect of some other issue. That said, Docker
itself ignores these directories (after logging a warning) during
startup, so here are just bringing the Engine in line with the standard
behavior.

Resolves #301

- What I did

- How I did it

- How to verify it

- Description for the changelog

@lmbarros lmbarros self-assigned this Jul 14, 2022
@ghost
Copy link

ghost commented Jul 14, 2022

An error occurred whilst building your landr site preview:

{
  "name": "Error",
  "message": "Command failed with code undefined: /usr/src/app/node_modules/gatsby/cli.js build",
  "stack": "Error: Command failed with code undefined: /usr/src/app/node_modules/gatsby/cli.js build\n    at Object.exports.run (/usr/src/app/lib/build-runner.js:257:11)\n    at async build (/usr/src/app/bot/index.js:132:19)\n    at async /usr/src/app/bot/index.js:210:25\n    at async Promise.all (index 0)\n    at async middleware (/usr/src/app/node_modules/@octokit/webhooks/dist-node/index.js:355:5)"
}

@lmbarros
Copy link
Contributor Author

Totally untested as of now, therefore keeping PR as a draft.

@jellyfish-bot
Copy link

[alexgg] This has attached https://jel.ly.fish/9aaf5f96-265c-4c52-9e59-074e973b53f8

@zoobot
Copy link

zoobot commented Nov 12, 2022

I added a test and some error handling but not quite ready to merge this as my tests aren't working when I run individual tests.

The migrate_test.go tests pass when I run all the tests: make test-unit

But these all fail with package errors and GOROOT/GOPATH errors so I need to sort that out.
make test-unit TESTDIRS=pkg
make test-unit TESTDIRS=pkg/storagemigration
make test-unit TESTDIRS=pkg/storagemigration/migrate_test.go

@zoobot zoobot self-assigned this Nov 12, 2022
@zoobot zoobot linked an issue Nov 12, 2022 that may be closed by this pull request
@lmbarros
Copy link
Contributor Author

Hey @zoobot ! I think you need to include the ./ prefix:

make test-unit TESTDIRS=./pkg/storagemigration

@zoobot
Copy link

zoobot commented Nov 14, 2022

That . worked! :) thanks @lmbarros

@zoobot
Copy link

zoobot commented Nov 14, 2022

@zoobot
Copy link

zoobot commented Nov 21, 2022

Update on this @lmbarros :

Was hoping to get a rpi3 to test on, Ross is sending one.
Also trying to set up QEMU to test this on for quicker results, sorting out m1 QEMU issues.

Making changes today:
Don't just delete single container layer that has missing config.v2.json

On migration failure due to missing config.v2.json:
Remove all overlay2 layers, aufs layers remain

We have seen some cases in which the file
`/var/lib/docker/containers/<UUID>/config.v2.json` was not found for
certain UUIDs. This causes the whole migration to fail and, worse,
causes our migration cleanup code to fail.

A missing `config.v2.json` indicates that directory does not contain a
valid container, so we should not even try to migrate them. That's what
this commit does: it makes the migration ignore directories with a
missing `config.v2.json`.

I don't think we should have directories like this in the first place,
this is probably the side effect of some other issue. That said, Docker
itself ignores these directories (after logging a warning) during
startup, so here are just bringing the Engine in line with the standard
behavior.

Signed-off-by: Leandro Motta Barros <leandro@balena.io>
Change-type: patch
@zoobot zoobot force-pushed the lmb/dont-migrate-bad-containers branch 3 times, most recently from 8162043 to b71b5df Compare November 21, 2022 23:56
@zoobot
Copy link

zoobot commented Nov 21, 2022

@lmbarros
I made the changes to test and cleanup so it removes all the overlay2 folders/containers on rollback, instead of just removing the single layer with the missing config.v2.json.

Should we wait until I have a RP3 device or QEMU to test this on or roll it out with the current go test?
Do you always test on devices first?

Thanks for looking

@@ -36,5 +31,23 @@ func failCleanup(root string) error {
return err
}

err = SwitchAllContainersStorageDriver(root, "aufs")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any of the removeDirIfExists() calls above fails, we'd return before reaching this call here, and I believe this could leave the device in an inconsistent state. (Specifically: if any containers were successfully migrated, they'd be tagged as being overlay2 containers, but the corresponding overlay2 data might have been deleted.)

return nil
}

func failCleanupContainer(root string, containerID string) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not being currently called anywhere.

@lmbarros
Copy link
Contributor Author

Also there's a typo in the changelog entry: "Rollback to ausf..."

@lmbarros
Copy link
Contributor Author

Should we wait until I have a RP3 device or QEMU to test this on or roll it out with the current go test?
Do you always test on devices first?

I often do this because I am paranoid and like the ritual, but no, not necessarily. In fact, I think it's generally more important to have a good automated test case (either here or in meta-balena) capable of ensuring that we really fixed the issue.

Add test for missing config.v2.json

Changelog-entry: Rollback to aufs and cleanup overlay2 in case of missing config.v2.json
Change-type: patch
Signed-off-by: zoobot <rommims@gmail.com>
@zoobot zoobot force-pushed the lmb/dont-migrate-bad-containers branch from b71b5df to 3915950 Compare November 22, 2022 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aufs to overlay2 migration failure
3 participants